The recordExtractor action is a function that takes a single object as its input parameter and returns an array of records.

JavaScript
recordExtractor: ({ url, $, contentLength, fileType }) => {
  return [
    {
      url: url.href,
      text: $('p').html(),
      /* ...anything you want */
    },
  ];
  // return []; skips the page
}

Parameters

The object passed to recordExtractor has the following properties. Destructure only the ones your extractor needs.

$
object

A Cheerio instance with the HTML of the crawled page. For more information, see Extracting data with Cheerio.
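
As a minimal sketch of how `$` might be used, the extractor below reads the page title and the text of each paragraph. The selectors (`title`, `p`), the function name `extractTitleAndParagraphs`, and the attribute names `title` and `paragraphs` are assumptions chosen for illustration. `.text()`, `.map()`, and `.get()` are standard Cheerio methods. Note that `.html()`, as used in the example at the top of this page, returns the inner HTML of the first matched element only.

```javascript
// Hypothetical extractor: selectors and attribute names are illustrative only.
const extractTitleAndParagraphs = ({ url, $ }) => {
  // .text() returns the combined text content of the matched elements.
  const title = $('title').text().trim();

  // Collect the trimmed text of each <p> element, dropping empty paragraphs.
  const paragraphs = $('p')
    .map((_, el) => $(el).text().trim())
    .get()
    .filter((text) => text.length > 0);

  return [{ url: url.href, title, paragraphs }];
};
```

A function like this would be assigned to the recordExtractor property of an action.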

contentLength
number

The size of the crawled page in bytes.

dataSources
object

The external data sources of the current URL. Each key of this object corresponds to an externalData object. For example:

JavaScript
{
  dataSources: {
    dataSourceId1: { data1: 'val1', data2: 'val2' },
    dataSourceId2: { data1: 'val1', data2: 'val2' },
  }
}
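
A value from a data source could be copied into a record along these lines. This is a sketch only: `dataSourceId1` and `data1` mirror the example above, while the function name `extractWithExternalData` and the `category` attribute are hypothetical. It also assumes that a data source might have no entry for the current URL, so it falls back to a default.

```javascript
// Sketch: enrich a record with a value from an external data source.
// `category` is a hypothetical attribute name.
const extractWithExternalData = ({ url, dataSources }) => {
  // Assumption: the data source may be missing for this URL, so fall back to {}.
  const external = (dataSources && dataSources.dataSourceId1) || {};

  return [
    {
      url: url.href,
      // Use null when the external value is absent.
      category: external.data1 ?? null,
    },
  ];
};
```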

fileType
string

The file type of the crawled page or document.
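
One possible use of fileType and contentLength is to skip documents you don't want to index. The sketch below assumes that HTML pages are reported with the value `'html'`, which this page doesn't confirm, and uses an arbitrary 1 MB limit. `MAX_BYTES` and `extractSmallHtmlPages` are hypothetical names.

```javascript
// Assumptions: 'html' as a fileType value and the 1 MB limit are illustrative.
const MAX_BYTES = 1000000;

const extractSmallHtmlPages = ({ url, $, contentLength, fileType }) => {
  // Returning an empty array skips the page.
  if (fileType !== 'html' || contentLength > MAX_BYTES) {
    return [];
  }

  return [{ url: url.href, text: $('p').text() }];
};
```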

helpers
function

Helpers are functions that extract content and generate records for you, which can simplify your record extractor.

url
object

A Location object that contains the URL.

Returns

The record extractor returns an array of records, each an object with attributes. If it returns an empty array, the page is skipped and no records are created for it.
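
Because the return value is an array, one page can produce several records. As a sketch under assumed names (`extractPerHeading`, `position`, `heading`, and the `h2` selector are all illustrative), the following returns one record per second-level heading. On a page with no such headings it returns an empty array, so the page is skipped.

```javascript
// Hypothetical extractor: one record per <h2> element on the page.
const extractPerHeading = ({ url, $ }) =>
  $('h2')
    .map((index, el) => ({
      url: url.href,
      position: index, // order of the heading within the page
      heading: $(el).text().trim(),
    }))
    .get(); // convert the Cheerio collection to a plain array
```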